Fix updates for objects with housenumbers #773

Merged: 4 commits into komoot:master on Feb 24, 2024

lonvia
Collaborator

@lonvia lonvia commented Feb 24, 2024

There has been a long-standing issue that updates of places with housenumbers, as well as of housenumber interpolation objects, do not work properly. These places are added to the Photon database with a special database ID <place_id>.<housenumber> in order to allow multiple Photon objects for the same Nominatim place_id. This works fine on import but goes subtly wrong when doing updates, because an update only has information about the new state of a place, not the old one. Thus, it is not really possible to delete the old data for such a place, because we don't know what database ID to look it up under.
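The failure mode can be illustrated with a minimal sketch. It is a toy model only: a plain Python dict stands in for the Photon index, and the function names `import_place_old` and `update_place_old` are hypothetical, not part of Photon's code.

```python
# Toy model: a dict stands in for the Photon index (assumption for illustration).
index = {}

def import_place_old(place_id, housenumbers):
    """Old scheme: one document per housenumber, keyed <place_id>.<housenumber>."""
    for hnr in housenumbers:
        index[f"{place_id}.{hnr}"] = {"place_id": place_id, "housenumber": hnr}

def update_place_old(place_id, new_housenumbers):
    """An update only knows the new state, so it can only address the new keys."""
    for hnr in new_housenumbers:
        index.pop(f"{place_id}.{hnr}", None)
    import_place_old(place_id, new_housenumbers)

import_place_old(1234, ["1", "3", "5"])
update_place_old(1234, ["7", "9"])  # the place now covers different housenumbers

# Documents written by the initial import are still present after the update.
stale = sorted(k for k in index if k not in ("1234.7", "1234.9"))
print(stale)
```

The keys of the old documents depend on housenumbers that the update never sees, so those documents stay behind in the index.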

This PR changes the database ID for such objects to <place_id>.<seq_nr>. When an inserted place needs to be exploded into multiple Photon documents with different housenumbers, the documents are simply assigned sequential IDs. As a place is always updated as a whole, we can now delete all documents matching the pattern <place_id>.<seq_nr> by checking sequentially whether such a document exists in the database. If it does, delete it and move on to the next sequence number; if not, stop the entire process.
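The sequential-ID approach can be sketched roughly as follows. Again this is a toy model with a dict in place of the Photon index; the names `delete_place` and `upsert_place` are hypothetical, and starting the sequence at 0 is an assumption made for the example, not something stated in this PR.

```python
# Toy model: a dict stands in for the Photon index (assumption for illustration).
index = {}

def delete_place(place_id):
    """Delete <place_id>.<seq_nr> documents until the first missing sequence number."""
    seq = 0  # assumed starting value; the real code may count differently
    while index.pop(f"{place_id}.{seq}", None) is not None:
        seq += 1
    return seq  # number of documents removed

def upsert_place(place_id, housenumbers):
    """A place is always updated as a whole: drop all old documents, then re-add."""
    delete_place(place_id)
    for seq, hnr in enumerate(housenumbers):
        index[f"{place_id}.{seq}"] = {"place_id": place_id, "housenumber": hnr}

upsert_place(1234, ["1", "3", "5"])
upsert_place(1234, ["7", "9"])  # the old documents are found without knowing their housenumbers
print(sorted(index))
```

The loop relies on the sub IDs being contiguous, which holds as long as all documents of a place are always written and deleted together.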

The change does not modify the database schema, so the code happily works with older database dumps. Only when you want to make use of the fixed update function do you need to start off with a new dump created by this new code; otherwise you will see duplicate housenumbers creep into your database.

Tried updating the database on a planet import and was able to catch up on OSM data at a rate of about 1 day/hour (updating both the Nominatim DB and the Photon DB). This should be sufficient performance-wise.

The PR also finally adds tests for the update process and fixes an off-by-one error in the handling of new-style interpolations.

Instead of creating a document ID that consists of place ID and
housenumber, create an artificial sub ID for each document.
A single place ID is always updated as a batch. Thus if the sub IDs
are assigned sequentially, it is possible to find all documents belonging
to a given place ID by simply iterating over sub IDs until there is
no document anymore.
@lonvia lonvia merged commit ac66094 into komoot:master Feb 24, 2024
4 checks passed
@lonvia lonvia deleted the fix-interpolation-updates branch February 24, 2024 13:14